Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance
Authors: Carolin Strobl, Achim Zeileis
Abstract
Random forests have become a widely used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures, especially in the context of correlated predictor variables. For the test of Breiman and Cutler (2008), we investigate the statistical properties and find that the power of the test depends on both the sample size and the number of trees, an arbitrarily chosen tuning parameter, leading to undesired results that nullify any significance judgments. Moreover, the specification of the null hypothesis of this test is discussed in the context of correlated predictor variables.
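A minimal sketch of why the number of trees matters, assuming the test statistic is a z-score formed as the mean per-tree importance divided by its standard error over the trees (the abstract itself does not spell out the formula). The per-tree values are simulated as independent normal draws, which is a simplification. The names `z_score`, `simulate`, `effect`, and `spread` are hypothetical and chosen for illustration only; nothing here reproduces results from the paper.

```python
import math
import random
import statistics


def z_score(per_tree_importance):
    """Mean per-tree importance divided by its standard error across trees."""
    n_trees = len(per_tree_importance)
    mean = statistics.fmean(per_tree_importance)
    std_err = statistics.stdev(per_tree_importance) / math.sqrt(n_trees)
    return mean / std_err


def simulate(n_trees, effect=0.02, spread=1.0, seed=0):
    """Draw hypothetical per-tree importance values and return their z-score.

    `effect` is an assumed, very small true mean importance; `spread` is the
    assumed standard deviation of the per-tree values.
    """
    rng = random.Random(seed)
    values = [rng.gauss(effect, spread) for _ in range(n_trees)]
    return z_score(values)


if __name__ == "__main__":
    # Same tiny effect, increasingly large forests.
    for n_trees in (100, 1_000, 10_000, 100_000):
        print(n_trees, round(simulate(n_trees), 2))
```

Under these assumptions the expected z-score grows roughly with the square root of the number of trees, so a fixed, tiny effect can be pushed past any significance threshold simply by growing a larger forest.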
Similar resources
Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance
Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures. For the test of Breiman and Cutler (2008), w...
Carolin Strobl, Torsten Hothorn, Achim Zeileis: Party on! A New, Conditional Variable Importance Measure for Random Forests Available in the party Package
Random survival forests for high-dimensional data
Minimal depth is a dimensionless order statistic that measures the predictiveness of a variable in a survival tree. It can be used to select variables in high-dimensional problems using Random Survival Forests (RSF), a new extension of Breiman’s Random Forests (RF) to survival settings. We review this methodology and demonstrate its use in high-dimensional survival problems using a public domai...
Estimation of Phosphorus Reduction from Wastewater by Artificial Neural Network, Random Forest and M5P Model Tree Approaches
This study aims to examine the ability of free floating aquatic plants to remove phosphorus and to predict the reduction of phosphorus from rice mill wastewater using soft computing techniques. A mesocosm study was conducted at the mill premises under normal conditions, and reliable results were obtained. Four aquatic plants, namely water hyacinth, water lettuce, salvinia, and duckweed were use...
Publication date: 2008